Abstract

The emission rate of minority atmospheric gases is inferred by a new approach
based on neural networks. The network applied is the multilayer perceptron,
trained with the backpropagation algorithm. The identification of these
surface fluxes is an inverse problem. The new neural inversion is compared
with regularized inverse solutions. The results obtained with the neural
networks are significantly better. In addition, after training, the inversion
with the neural networks is faster than the regularized approaches.

Key words: Neural networks; inverse problems; surface emission rate of 
atmospheric gases. 



1. Introduction 

The increasing concentration of greenhouse gases is a central issue
nowadays, mainly regarding the most important anthropogenic gases, such as
methane (CH4) and carbon dioxide (CO2). Despite the ratification of the
Kyoto Protocol, releases of CO2 and CH4 into the atmosphere are forecast to
continue increasing over the next decade [16].

One mandatory strategy is to monitor the concentration of these gases in
the atmosphere. However, in order to understand the bio-geochemical cycle
of these gases, it is necessary to estimate the surface emission rates. One
procedure for this is to employ inverse problem methodology.

Inverse problem methods are an efficient way to estimate the intensity of
pollution sources. Various inverse problem methods are being investigated
by the international scientific community [8, 25, 26]. In order to deal
with the ill-posed character of inverse problems, regularized solutions
[4, 28] and regularized iterative solutions [1, 5] have been proposed. More
recently, artificial neural networks have also been employed to solve
inverse problems [15, 31, 27]. Pollutant source identification is an
inverse problem, and neural networks have been applied to identify the
emission intensity of point sources [17, 20, 12, 21, 30].

In this paper, a new approach using a multilayer perceptron artificial
neural network (MLP-ANN) is employed to estimate the surface emission rate
of a pollutant. The input for the ANN is the gas concentration measured at
a set of points. The methodology is tested using synthetic experimental
data, obtained by running an atmospheric pollutant dispersion model, LAMBDA
[10, 11], which is a Lagrangian model.

Finally, the surface rate estimated with the MLP-ANN is compared with
regularized inversion based on the maximum entropy principle. In the latter
method, the inverse problem is formulated as an optimization problem that
can be solved using a deterministic or a stochastic optimization procedure.
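
To make the regularized formulation concrete, the following Python sketch evaluates one possible objective: a squared data misfit minus a weighted entropy of the emission vector. It assumes a linear forward model c = M s; the matrix `M`, the weight `alpha`, and all numbers are hypothetical and are not taken from the cited works.

```python
import math

def entropy(s):
    """Shannon entropy of the non-negative vector s after normalising it to sum 1."""
    total = sum(s)
    probs = [v / total for v in s if v > 0.0]
    return -sum(p * math.log(p) for p in probs)

def forward(m, s):
    """Assumed linear forward model: predicted concentrations c = M s."""
    return [sum(m_ij * s_j for m_ij, s_j in zip(row, s)) for row in m]

def objective(s, m, c_obs, alpha):
    """Squared data misfit minus alpha times the entropy of s."""
    misfit = sum((c - p) ** 2 for c, p in zip(c_obs, forward(m, s)))
    return misfit - alpha * entropy(s)

# Hypothetical example with 2 sensors and 3 sources (illustrative numbers).
M = [[0.5, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
s_true = [10.0, 10.0, 10.0]
c_obs = forward(M, s_true)

print("objective at the true emissions:", objective(s_true, M, c_obs, alpha=0.1))
print("objective at a peaked guess:    ", objective([30.0, 0.0, 0.0], M, c_obs, alpha=0.1))
```

A deterministic (for example quasi-Newton) or stochastic (for example particle swarm) optimizer would then search for the vector `s` that minimizes `objective`.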

2. Forward Model 

The Lagrangian particle model LAMBDA was developed to study the transport
and diffusion of pollutants, starting from Brownian random walk modeling
[11, 9]. In the LAMBDA code, fully uncoupled particle movements are
assumed. Therefore, each particle trajectory can be described by the
generalized three-dimensional form of the Langevin equation for velocity
[29]:

$$du_i = a_i(\mathbf{x},\mathbf{u},t)\,dt + b_{ij}(\mathbf{x},\mathbf{u},t)\,dW_j(t) \qquad (1)$$

$$d\mathbf{x} = (\mathbf{U} + \mathbf{u})\,dt \qquad (2)$$

where $i, j = 1, 2, 3$, $\mathbf{x}$ is the displacement vector,
$\mathbf{U}$ is the mean wind velocity vector, $\mathbf{u}$ is the
Lagrangian velocity vector, $a_i(\mathbf{x},\mathbf{u},t)$ is a
deterministic term, $b_{ij}(\mathbf{x},\mathbf{u},t)\,dW_j(t)$ is a
stochastic term, and the quantity $dW_j(t)$ is the incremental Wiener
process.

The deterministic (drift) coefficient $a_i(\mathbf{x},\mathbf{u},t)$ is
computed using a particular solution of the Fokker-Planck equation
associated with the Langevin equation. The diffusion coefficient
$b_{ij}(\mathbf{x},\mathbf{u},t)$ is obtained from the Lagrangian structure
function in the inertial subrange ($\tau_K \ll \Delta t \ll \tau_L$), where
$\tau_K$ is the Kolmogorov time scale and $\tau_L$ is the Lagrangian
decorrelation time scale. These parameters can be obtained by employing
Taylor's statistical theory of turbulence [32].

Backward integration can also be applied. It serves to identify which
particles arriving at sensor $j$ come from source $i$.

The drift coefficient $a_i(\mathbf{x},\mathbf{u},t)$ for forward and
backward integration is given by

and

where $c_v = 1$ for forward integration and $c_v = -1$ for backward
integration, $P_E = P(\mathbf{x},\mathbf{u},t)$ is the non-conditional PDF
of the Eulerian velocity fluctuations, and
$B_{ij} = \frac{1}{2} b_{ik} b_{jk}$.

For backward integration, the time considered is $t' = -t$ and the velocity
is $\mathbf{U}' = -\mathbf{U}$, where $\mathbf{U}$ is the mean wind
velocity. The horizontal PDFs are considered Gaussian, and for the vertical
coordinate the truncated Gram-Charlier type-C expansion of third order is
employed [33].

The diffusion coefficient $b_{ij}(\mathbf{x},\mathbf{u},t)$, for both
forward and backward integration, is given by

$$b_{ij} = \delta_{ij} \left( \frac{2\sigma_i^2}{\tau_{L_i}} \right)^{1/2} \qquad (6)$$

where $\delta_{ij}$ is the Kronecker delta, and $\sigma_i^2$ and
$\tau_{L_i}$ are the velocity variance of each component and the Lagrangian
time scale [32], respectively. With the coordinates and the mass of each
particle, the concentration is computed; see equations (5) and (6).
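
As an illustration of how a Langevin model of this kind is stepped in time, the Python sketch below integrates one velocity component and the corresponding position with the Euler-Maruyama scheme. It is not the LAMBDA code: it assumes homogeneous Gaussian turbulence, for which the drift reduces to $-u/\tau$, and takes the diffusion coefficient as $(2\sigma^2/\tau)^{1/2}$. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_particle(sigma, tau, mean_wind, dt, n_steps, rng):
    """Euler-Maruyama steps for one velocity component and its position.

    du = -(u / tau) dt + sqrt(2 sigma^2 / tau) dW   (assumed homogeneous drift)
    dx = (mean_wind + u) dt
    """
    b = math.sqrt(2.0 * sigma ** 2 / tau)      # diffusion coefficient
    u = rng.gauss(0.0, sigma)                  # start in the stationary state
    x = 0.0
    velocities = []
    for _ in range(n_steps):
        d_w = rng.gauss(0.0, math.sqrt(dt))    # Wiener increment with variance dt
        u += -(u / tau) * dt + b * d_w
        x += (mean_wind + u) * dt
        velocities.append(u)
    return velocities, x

rng = random.Random(0)
us, x_end = simulate_particle(sigma=0.5, tau=100.0, mean_wind=2.6, dt=1.0,
                              n_steps=200000, rng=rng)
variance = sum(u * u for u in us) / len(us)
print(f"sample velocity variance: {variance:.3f} (target sigma^2 = 0.25)")
print(f"final position: {x_end:.1f} m")
```

With a time step much smaller than `tau`, the sample variance of the simulated velocity should stay close to `sigma ** 2`, which is a quick sanity check on the drift and diffusion terms.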

The inverse problem here is to identify the source term S(t). As mentioned,
a source-receptor approach is employed to reduce the computing time,
instead of running the direct model (equation 2) at each iteration. This
approach gives an explicit relation between the pollutant concentration at
the i-th receptor and the j-th sources:

$$C_i = \sum_{j} M_{ij} S_j \qquad (7)$$

where $M_{ij}$ is the transition matrix, with entries given by

$$M_{ij} = \begin{cases} \dfrac{V_{S,j}}{V_{R,i}} \dfrac{\Delta t}{N_{S,j}} N_{R,i,j} & \text{(forward)} \\[2ex] \dfrac{\Delta t}{N_{R,i}} N_{S,i,j} & \text{(backward)} \end{cases} \qquad (8)$$

where $V_{R,i}$ and $V_{S,j}$ are the volumes of the i-th receptor and j-th
source, respectively; $N_{S,j}$ and $N_{R,i}$ are the numbers of particles
released by the j-th source and the i-th sensor, respectively; and
$N_{R,i,j}$ and $N_{S,i,j}$ are the numbers of particles released by the
j-th source and detected by the i-th receptor.
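
The practical benefit of the source-receptor approach can be sketched in a few lines of Python: once the transition matrix is stored, each candidate source vector is mapped to sensor concentrations by a single matrix-vector product, with no new particle simulation. The forward-mode entry below assumes the form (V_S / V_R) (dt / N_S) N_R; this form, the function names, and all numbers are illustrative assumptions.

```python
def transition_entry_forward(n_detected, n_released, v_source, v_receptor, dt):
    """Assumed forward-mode entry: (V_S / V_R) * (dt / N_S) * N_R."""
    return (v_source / v_receptor) * (dt / n_released) * n_detected

def concentrations(m, s):
    """Concentration at each receptor i as the sum over sources j of M_ij * S_j."""
    return [sum(m_ij * s_j for m_ij, s_j in zip(row, s)) for row in m]

# Hypothetical particle counts: detected[i][j] particles from source j seen at sensor i.
detected = [[2, 0, 1],
            [0, 3, 1]]
released = [100, 100, 100]          # particles released per source (hypothetical)
v_source, v_receptor, dt = 4.0, 2.0, 0.5

M = [[transition_entry_forward(detected[i][j], released[j], v_source, v_receptor, dt)
      for j in range(3)] for i in range(2)]

emissions = [10.0, 20.0, 10.0]      # trial source strengths (hypothetical)
print(concentrations(M, emissions))
```

Because `M` does not depend on the source strengths, the same matrix can be reused at every iteration of an inversion, or to generate many training pairs for a neural network.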



3. Inverse Method: Neural Network 

An artificial neural network (ANN) is an interconnected group of artificial
neurons that uses a mathematical or computational model for information
processing, based on a connectionist approach to computation. The inputs
and outputs of a neuron consist of values $(x_1, x_2, \ldots, x_n)$ and
$(y_1, y_2, \ldots, y_n)$. The neuron computes the weighted sum of its
inputs, adds a bias, and passes the result as the argument of a nonlinear
activation function. The MLP is composed of multiple processing units
called artificial neurons (or nodes) arranged in several layers [2].
Configuring the best MLP model involves choosing the number of layers
(typically at least three: input, hidden, and output), the number of
neurons in the hidden layer (the number of units in the input and output
layers is defined by the problem), the activation function, and the
learning algorithm.
After the proper architecture of the MLP has been established, all the
training cases are run through the network. In each neuron, a linear
combination of the weighted inputs (including a bias) is computed and
transformed using a transfer function (linear or nonlinear). The value
obtained is passed on as an input to the neurons in the subsequent layer
until a value is computed in the neurons of the output layer. The output
values are compared with the target values. The difference between output
and target is calculated for each output neuron using an error function,
giving the prediction error made by the network. Then, the training
algorithm is used to adjust the network's weights and thresholds in order
to minimize this error. Because a target value is compared with the output
value, the learning process is called supervised [14]. In this study a
linear transfer function was used in the input neurons, and the log-sigmoid
function in the neurons of the hidden and output layers. The error function
was the sum-squared error, in which the individual errors of the output
units on each case are squared and summed. The networks were trained using
the backpropagation algorithm [2]. The weight correction $\Delta w_{ij}$
was calculated according to the formula
$\Delta w_{ij} = \eta \delta_j o_i + \alpha \Delta w_{ij}^{old}$, in which
$j$ is the index of the neuron in the current layer, the neuron in the
upper layer is indexed by $i$ and its output by $o_i$, and the local error
gradient is denoted by $\delta_j$. There are two constants in this formula:
the learning rate $\eta$, which determines to what extent the weights are
modified, and the momentum coefficient $\alpha$, which determines to what
degree the previous adjustment is taken into account, so as to prevent
sudden changes in the direction in which corrections are made. The learning
rate $\eta$ and momentum $\alpha$ were set to 0.1 and 0.5, respectively.
ANNs can be used to model complex relationships between inputs and outputs
or to find patterns in data. In this paper, the network used is the
feedforward multilayer perceptron (MLP) with backpropagation learning.
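
The forward pass and the momentum-based weight correction described above can be illustrated with a small, self-contained Python sketch. It is a toy implementation, not the network used in this study: it has a single hidden layer, log-sigmoid units, per-sample updates, and a made-up training set. Only the learning rate 0.1 and momentum 0.5 follow the text; the class name `TinyMLP` and everything else are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One-hidden-layer perceptron with log-sigmoid units (illustrative only)."""

    def __init__(self, n_in, n_hid, n_out, rng):
        # Each row holds the incoming weights of one neuron; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
        # Previous corrections, needed by the momentum term.
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hid)]
        self.dw2 = [[0.0] * (n_hid + 1) for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, eta=0.1, alpha=0.5):
        h, y = self.forward(x)
        # Local error gradients for the sum-squared error with sigmoid units.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
        d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j, hj in enumerate(h)]
        self._update(self.w2, self.dw2, d_out, h + [1.0], eta, alpha)
        self._update(self.w1, self.dw1, d_hid, x + [1.0], eta, alpha)
        return sum((t - o) ** 2 for t, o in zip(target, y))

    @staticmethod
    def _update(w, dw, delta, inputs, eta, alpha):
        # dw_ij = eta * delta_j * o_i + alpha * (previous dw_ij)
        for j, d in enumerate(delta):
            for i, o in enumerate(inputs):
                dw[j][i] = eta * d * o + alpha * dw[j][i]
                w[j][i] += dw[j][i]

rng = random.Random(1)
net = TinyMLP(n_in=2, n_hid=4, n_out=1, rng=rng)
data = [[rng.random(), rng.random()] for _ in range(20)]   # made-up inputs
targets = [[0.5 * (a + b)] for a, b in data]               # made-up targets

def epoch_error():
    """Run one pass over the training set and return the summed squared error."""
    return sum(net.train_step(x, t) for x, t in zip(data, targets))

first = epoch_error()
for _ in range(2000):
    last = epoch_error()
print(f"sum-squared error: first epoch {first:.4f}, last epoch {last:.4f}")
```

The momentum term reuses the previous correction stored in `dw`, so successive updates that point in the same direction reinforce each other while isolated reversals are damped.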

Regardless of their type or use, all neural networks have three stages in
their application: the learning, activation, and generalization steps. In
the learning step, the weights and biases corresponding to each connection
are adjusted to some reference examples (the input). In the activation
phase, the output is obtained based on the weights and biases computed in
the learning phase [14]. The transfer function selected was a tan-sigmoid
transfer function for the hidden layers and a linear transfer function for
the output layer. The experimental data used here in the learning step were
simulated by adding a random perturbation to the exact solution of the
forward problem (LAMBDA):

$$I = I_{exact}\,(1 + \sigma\mu) \qquad (9)$$

where $\sigma$ is the noise standard deviation and $\mu$ is a random
variable drawn from a Gaussian distribution with zero mean and unit
variance. In all simulations we used $\sigma = 0.05$ and $\sigma = 0.10$.
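
The multiplicative noise model of equation (9) is simple to reproduce. The Python sketch below perturbs a list of exact values with independent Gaussian draws; the function name `add_noise` and the sample values are hypothetical.

```python
import random

def add_noise(exact, sigma, rng):
    """Return exact * (1 + sigma * mu), with mu ~ N(0, 1) drawn independently per value."""
    return [v * (1.0 + sigma * rng.gauss(0.0, 1.0)) for v in exact]

rng = random.Random(42)
exact = [1.0] * 10000                      # hypothetical noiseless concentrations
noisy = add_noise(exact, sigma=0.05, rng=rng)

# Relative deviations should have mean near 0 and standard deviation near sigma.
rel = [n / e - 1.0 for n, e in zip(noisy, exact)]
mean = sum(rel) / len(rel)
std = (sum((r - mean) ** 2 for r in rel) / len(rel)) ** 0.5
print(f"relative deviation: mean {mean:.4f}, standard deviation {std:.4f}")
```

Because the perturbation is proportional to the exact value, $\sigma = 0.05$ corresponds to a noise level of 5% and $\sigma = 0.10$ to 10%.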

Overall, more than fifty pairs of pollutant emission rates and their
concentrations are necessary for the inversion process. Similar data sets
were used for the activation and generalization stages of the ANN.

4. Results 

The area of the numerical experiment is the same as that employed by
Roberti (2005) [22] and Luz (2007) [18]. The domain is divided into 25
sub-domains, where each cell measures 300 m (width) x 200 m (length); the
volume of each cell is 60,000,000 m³ (height = 1000 m) [22]. Figure 1 shows
the different emission subdomains. In this figure, the sensor positions in
the study area are represented by •.

Six sensors are used inside the domain. Each sensor occupies a small
volume, 0.1 m x 0.1 m x 0.1 m, at a height of 10 m, and is positioned in
the area according to Table 1.

Table 2 shows the meteorological data used by the LAMBDA model to simulate
the dispersion of particles, taken from the Copenhagen experiment [22, 10].
The meteorological data are average wind speeds and directions, measured at
three levels for five different time periods on 19/10/1978 [22].

The results obtained with noiseless data are in excellent agreement with
the true model. The MLP network produced good estimates of the pollutant
emission rate compared with the quasi-Newton method [22] and Particle Swarm
Optimization (PSO) [18]. The best results for noisy data were obtained for
the MLP network with 15 and 30 neurons in the hidden layers and
$\sigma = 0.05$.

In order to analyze the performance of the ANNs in estimating the gas
emission rate, two experiments were performed. In the first experiment, 5%
of white Gaussian noise ($\sigma = 0.05$) was added to the synthetic
experimental data. In the second, 10% of noise was used to simulate real
experimental data. All ANNs were trained with two hidden layers, varying
the number of hidden neurons and the training database. Several tests were
carried out to arrive at a good neural network architecture. Three
configurations were considered: (a) ANN-1: 6 and 12, (b) ANN-2: 7 and 8,
(c) ANN-3: 15 and 30 neurons in the hidden layers. ANN-3 obtained the best
result compared with the true model. In Figure 2 and Figure 3, the topology
of the ANN is represented by $x_1$, the number of neurons in the input
layer; $x_2$, the number of neurons in the 1st hidden layer; $x_3$, the
number of neurons in the 2nd hidden layer; and $x_4$, the number of neurons
in the output layer.
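
To give a sense of the relative size of the three configurations, the Python sketch below counts the adjustable weights and biases of a fully connected network from its layer sizes. It assumes 6 input neurons and 12 output neurons for all three networks, as in the topologies 6:6:12:12 and 6:15:30:12 listed in Tables 3 and 4; the 6:7:8:12 topology for ANN-2 is inferred from the text.

```python
def count_parameters(layers):
    """Weights plus biases of a fully connected feedforward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

topologies = {
    "ANN-1": [6, 6, 12, 12],
    "ANN-2": [6, 7, 8, 12],     # assumed input and output sizes
    "ANN-3": [6, 15, 30, 12],
}

for name, layers in topologies.items():
    print(name, ":".join(str(n) for n in layers), count_parameters(layers))
# ANN-1 6:6:12:12 282
# ANN-2 6:7:8:12 221
# ANN-3 6:15:30:12 957
```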

The training phase was carried out until a maximum number of iterations was
reached. Table 3 shows the exact results (LAMBDA), the results obtained
with regularized inversion (the optimization problem solved by quasi-Newton
(Q-N, deterministic) [22] and particle swarm optimization (PSO, stochastic)
[18] schemes), and the results obtained with the ANN for $\sigma = 0.05$.
Table 4 shows the same results for $\sigma = 0.10$.

Figures 2 and 3 show the results of the generalization tests in comparison
with the true model.
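
One simple way to summarize the comparison is the mean absolute error of each method with respect to the exact emission rates. The Python sketch below does this for the values listed in Table 4 ($\sigma = 0.10$); the choice of metric is an illustration and is not part of the original analysis.

```python
# Exact emission rates and estimates for cells A2..A19, transcribed from Table 4.
exact = [10.0] * 6 + [20.0] * 6
estimates = {
    "Q-N": [8.97, 9.97, 12.52, 7.98, 10.14, 11.56,
            13.84, 22.65, 14.14, 19.99, 21.17, 24.90],
    "PSO": [9.83, 10.40, 10.79, 10.50, 12.06, 11.28,
            14.56, 22.67, 15.85, 21.56, 20.05, 21.74],
    "ANN 6:6:12:12": [10.33, 10.11, 10.12, 10.27, 10.24, 10.17,
                      21.21, 21.28, 21.32, 21.47, 21.47, 21.48],
    "ANN 6:15:30:12": [10.11, 10.11, 10.20, 10.20, 10.06, 10.17,
                       20.97, 21.00, 20.95, 20.99, 20.99, 20.96],
}

def mean_absolute_error(values, reference):
    return sum(abs(v - r) for v, r in zip(values, reference)) / len(reference)

mae = {name: mean_absolute_error(vals, exact) for name, vals in estimates.items()}
for name, err in sorted(mae.items(), key=lambda item: item[1]):
    print(f"{name:15s} {err:.3f}")
```

On these values the two networks have a smaller mean absolute error than both regularized solutions, with the 6:15:30:12 network the smallest.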

5. Conclusion 

The problem of identifying the minority gas emission rate for the
ground-atmosphere system is an important issue for the bio-geochemical
cycle, and it has been intensively investigated. This inverse problem has
been solved using regularized solutions [25], Bayes estimation [8, 13], and
variational methods [7] (an approach that started from data assimilation
studies).

Our previous studies were initiated using a generalized least-squares
scheme with entropic regularization [22, 23, 18, 19] (for the use of the
maximum entropy principle on this issue, see also [3, 6]).

ANNs were used as an effective tool for solving the inverse problem of
estimating the rate of surface gas emission. The reconstructions obtained
with the MLP proved better than those obtained with regularization methods
[22, 18]. Another advantage of neural networks is that, after the training
phase, the reconstruction algorithm is faster than the regularized
inversion methods. Neural inversion is a unique scheme in that it does not
need a solution of the associated forward problem.

In practice, operational inversion algorithms reduce the risk of being 
trapped in local minima by starting the iterative search process from an 
initial guess solution that is sufficiently close to the true profile. However, 
the dependence of the final solution on a good choice of the initial guess rep- 
resents a fundamental weakness of such algorithms, particularly in regions 
where less a priori information is available [17]. ANN approaches can relax
this constraint by incorporating more data in the dataset during the
learning phase.

ANNs can be inaccurate if they are used to extrapolate to cases outside the
training domain. However, the use of ANN techniques can provide good
solutions when the training phase encompasses the domain of the potential
solutions to the real problem.

Table 1: Position of the sensors in the area.

Sensor   Position x (m)   Position y (m)
1        400              500
2        600              300
3        800              700
4        1000             500
5        1200             300
6        1400             700

Table 2: Meteorological data from the Copenhagen experiment.

Time     Speed U (m/s)             Direction U (°)
(h:m)    10 m    120 m   200 m     10 m    120 m   200 m
12:05    2.6     5.7     5.7       290     310     310
12:15    2.6     5.1     5.7       300     310     310
12:25    2.1     4.6     5.1       280     310     320
12:35    2.1     4.6     5.1       280     310     320
12:45    2.6     5.1     5.7       290     310     310

Table 3: Results of the estimation of the gas emission rate (g m⁻³ s⁻¹),
for noise level σ = 0.05.

Cell   Exact   Q-N               PSO           ANN 6:6:12:12   ANN 6:15:30:12
               (Roberti, 2005)   (Luz, 2007)
A2     10      9.82              9.34          9.92            9.79
A3     10      9.63              10.07         9.83            9.74
A4     10      11.26             11.26         9.86            4.81
A7     10      8.76              10.95         9.82            9.80
A8     10      11.06             10.93         9.71            9.73
A9     10      15.51             14.99         9.71            9.79
A12    20      20.12             20.79         20.94           20.61
A13    20      19.25             19.83         20.81           20.61
A14    20      11.52             13.06         20.93           20.60
A17    20      17.88             18.72         20.94           20.60
A18    20      23.82             22.76         20.96           20.61
A19    20      23.44             22.47         20.77           20.59

Table 4: Results of the estimation of the gas emission rate (g m⁻³ s⁻¹),
for noise level σ = 0.10.

Cell   Exact   Q-N               PSO           ANN 6:6:12:12   ANN 6:15:30:12
               (Roberti, 2005)   (Luz, 2007)
A2     10      8.97              9.83          10.33           10.11
A3     10      9.97              10.40         10.11           10.11
A4     10      12.52             10.79         10.12           10.20
A7     10      7.98              10.50         10.27           10.20
A8     10      10.14             12.06         10.24           10.06
A9     10      11.56             11.28         10.17           10.17
A12    20      13.84             14.56         21.21           20.97
A13    20      22.65             22.67         21.28           21.00
A14    20      14.14             15.85         21.32           20.95
A17    20      19.99             21.56         21.47           20.99
A18    20      21.17             20.05         21.47           20.99
A19    20      24.90             21.74         21.48           20.96

Figure 1: Computational domain divided into 25 subdomains.

Figure 2: Generalization results for the pollutant emission rate (in
g m⁻³ s⁻¹) for noiseless data using σ = 0.05. (a) True model; (b)
Quasi-Newton [22]; (c) PSO [18]; (d) 6 and 12 neurons in the hidden layers;
(e) 7 and 8; (f) 15 and 30.

Figure 3: Generalization results for the pollutant emission rate (in
g m⁻³ s⁻¹) for noiseless data using σ = 0.10. (a) True model; (b)
Quasi-Newton [22]; (c) PSO [18]; (d) 6 and 12 neurons in the hidden layers;
(e) 7 and 8; (f) 15 and 30.